ABSTRACT

The capability of the DIMTEST statistical test to
assess essential dimensionality of the model underlying item
responses of real tests as opposed to simulated tests was
investigated. A variety of real test data from different sources was
used to assess essential dimensionality. Based on DIMTEST results,
some test data are assessed as fitting an essential unidimensional
model, while others are not. Essential unidimensional test data, as
assessed by DIMTEST, are then combined to form two-dimensional test
data. The power of Stout's statistic T is examined for the
two-dimensional data. It is shown that the results of DIMTEST on real
tests replicate findings from simulated tests in that the statistic T
discriminates well between essential unidimensional and
multidimensional tests and is also highly sensitive to major 
abilities while being insensitive to relatively minor abilities 
influencing item responses. Five tables present analysis results, and 
38 references are included. (Author/SLD) 
Assessing Essential Dimensionality of Real Data



Abstract 



The purpose of this article is to validate the capability of DIMTEST to assess 
essential dimensionality of the model underlying the item responses of real tests as opposed 
to simulated tests. A variety of real test data from different sources are used to assess 
essential dimensionality. Based on DIMTEST results, some test data are assessed as fitting 
an essential unidimensional model while others are not. Essential unidimensional test data, 
as assessed by DIMTEST, are then combined to form two-dimensional test data. The
power of Stout's statistic T is examined for these two-dimensional data. It is shown that 
the results of DIMTEST on real tests replicate findings from simulated tests in that the 
statistic T discriminates well between essential unidimensional and multidimensional tests. 
It is also highly sensitive to major abilities while being insensitive to relatively minor 
abilities influencing item responses. 

Subject terms: DIMTEST, essential independence, essential dimensionality, 
unidimensionality, multidimensionality, item response theory.
Most of the currently used item response theory (IRT) models require the assumption
of unidimensionality. From the strict IRT perspective, unidimensionality refers to one, and 
only one, trait underlying test items. Yet, it is a well known fact that items are multiply 
determined (Humphreys, 1981, 1985, 1986; Hambleton & Swaminathan, 1985, chap. 2; 
Reckase, 1979, 1985; Stout, 1987; Traub, 1983). Hence from the substantive viewpoint, the 
assumption of unidimensionality requires that the test items measure one dominant trait. 
Stout (1987) coined the term essential unidimensionality to refer to a particular 
mathematical formulation of a test having exactly one dominant trait. Dimensionality is, 
however, determined by the joint influence of test items and examinees taking the test 
(Reckase, 1990). In addition, extraneous factors such as teaching methods, anxiety level of 
examinees, etc., may also influence the dimensionality of the given item response data. 
Thus dimensionality has to be assessed each time a test is administered to a new group of 
examinees. 

Factor analysis has traditionally been the most popular approach to assessing
dimensionality (Hambleton & Traub, 1973; Lumsden, 1961). Factor analysis, despite its
serious limitations in analyzing dichotomous data (for example, see Hulin, Drasgow, and
Parsons, 1983, chap. 8), has been the popular method to study the robustness of the
unidimensionality assumption (Drasgow & Parsons, 1983; Harrison, 1986; Reckase, 1979).
There are a number of other promising methods proposed and used in varying degrees to
assess dimensionality, to name a few: full information factor analysis based on the
principle of marginal maximum likelihood (Bock, Gibbons, & Muraki, 1985; TESTFACT: 
Wilson, Wood, & Gibbons, 1983); nonlinear factor analysis (McDonald, 1962; McDonald & 
Ahlawat, 1974; Jamshid & McDonald, 1983); Holland and Rosenbaum's (1986) test of 
unidimensionality, monotonicity and conditional independence based on contingency 
tables; Tucker and Humphreys' methods based on the principle of local independence and 
second factor loadings (Roznowski, Tucker, & Humphreys, 1991); and Stout's (1987) 
statistical procedure based on essential independence and essential dimensionality. Hattie
(1984, 1985) has provided a comprehensive review of traditional approaches to assess
dimensionality, and Zwick (1987) has applied some of the above-mentioned recent
procedures to assess dimensionality of National Assessment of Educational Progress data.
Despite having several procedures available to assess dimensionality, there is no widespread
consensus among substantive researchers for a preference for any method(s), and often
there is dissatisfaction about assessing dimensionality (Berger & Knol, 1990; Hambleton & 
Rovinelli, 1986; Hattie, 1985). 

Stout (1987) proposed a statistical test (DIMTEST) to assess essential
unidimensionality of the latent space underlying a set of items. Nandakumar (1987) and
Nandakumar and Stout (in press) have further modified, refined, and validated DIMTEST
for assessing essential dimensionality on a variety of simulated tests. This article
demonstrates the validity and usefulness of Stout's procedure on a variety of real, as
opposed to simulated, tests. Test data from different sources are collected and used to
assess essential unidimensionality. Essential unidimensional data are then combined to
form two-dimensional data. The power of Stout's statistic T is examined for these
two-dimensional data.

DIMTEST for Assessing Essential Unidimensionality 

DIMTEST, a statistical test for assessing unidimensionality, is based on the theory of
essential dimensionality and essential independence (Stout, 1987, 1990). An item pool is
said to be essentially independent with respect to the latent trait vector $\Theta$ if, for a given
initial segment of the item pool, the average absolute conditional (on $\Theta$) covariance of
item pairs approaches zero as the length of the segment increases. When only one dominant
ability $\theta$ meets the essential independence assumption, the item pool is said to be
essentially unidimensional. In contrast, the assumption of local independence requires the
conditional covariances to be zero for all item pairs in question. The number of abilities
required to satisfy the local independence assumption is the dimensionality of the test.
While the traditional definition of dimensionality (Lord & Novick, 1968) counts all abilities
required to respond to test items correctly to satisfy the assumption of local independence,
essential dimensionality counts only dominant abilities required to satisfy the assumption
of essential independence (as opposed to local independence). DIMTEST, using this
definition, assesses the closeness of approximation of the model generating the given item
responses to the essential unidimensional model. Nandakumar (1991) describes the
theoretical differences between traditional dimensionality and essential dimensionality and
establishes through Monte Carlo studies the usefulness of DIMTEST for assessing essential
unidimensionality in the possible presence of several secondary dimensions.
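
For reference, the essential independence condition can be written compactly. The notation below is assumed for illustration and is not taken from the article: $U_i$ is the scored response to item $i$, and $N$ is the length of the initial segment of the item pool. Stout (1987, 1990) gives the formal statement.

```latex
\frac{1}{\binom{N}{2}} \sum_{1 \le i < j \le N}
  \left| \operatorname{Cov}\left( U_i, U_j \mid \Theta = \theta \right) \right|
  \longrightarrow 0
  \quad \text{as } N \to \infty, \text{ for each } \theta .
```

Under this reading, the condition constrains the item pairs only on average, whereas local independence is a requirement on every item pair.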

To use DIMTEST for assessing essential unidimensionality, it is assumed that a
group of J examinees take an N-item test. Each examinee produces a vector of responses of
1s and 0s, with 1 denoting a correct response and 0 denoting an incorrect response. It is
assumed that essential independence with respect to some dominant ability $\theta$ holds and
that the item response functions are monotonic with respect to the same $\theta$. The
hypothesis is stated as follows:

$H_0: d_E = 1$ versus $H_1: d_E > 1$

where $d_E$ denotes the essential dimensionality of the latent space underlying a set of items.

In order to assess essential unidimensionality of a given set of test data, DIMTEST follows
several steps. The steps are summarized briefly here (for details see Stout, 1987;
Nandakumar & Stout, in press). First, test items are split into three subtests AT1, AT2,
and PT with the aid of factor analysis (FA) using part of the sample (a sample size of 500
is recommended for this purpose). Items of AT1 are selected so that they all tap the same
dominant ability. Instead of using FA, it is also possible to use expert opinion (EO) to
select items for AT1. If the FA method of selection is chosen, DIMTEST automatically
determines the length of the subtest AT1. Once items for AT1 are chosen, items of AT2
are selected so that they have a difficulty distribution similar to that of AT1 items (for
details see Stout, 1987). The remaining items form the partitioning subtest PT.

Second, examinees are assigned to K different subgroups based on their score on the
partitioning subtest PT. In other words, all examinees obtaining the same PT total score
are assigned to the same subgroup. When the subtest PT is "long" and the test is
essentially unidimensional, within each subgroup k, examinees are assumed to be
approximately of similar ability. When PT is not long, the subtest AT2 compensates for
the bias in AT1 caused by PT being short. Also, AT2 compensates for the bias in AT1
caused by the presence of guessing or the difficulty factor that is often found by the factor
analysis.
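
The grouping step can be sketched in a few lines. This is a minimal illustration, not the DIMTEST program itself: the function name `partition_by_pt_score`, the list-of-lists data layout, and the toy data are all assumptions made here.

```python
from collections import defaultdict


def partition_by_pt_score(responses, pt_items):
    """Group examinee indices by total score on the partitioning subtest PT.

    responses: list of 0/1 response vectors, one per examinee (assumed layout).
    pt_items:  column indices of the items that make up PT.
    Returns a dict mapping each observed PT score to the examinee indices
    that obtained it.
    """
    subgroups = defaultdict(list)
    for examinee, vector in enumerate(responses):
        score = sum(vector[i] for i in pt_items)
        subgroups[score].append(examinee)
    return dict(subgroups)


# Toy data (hypothetical): 5 examinees, 4 items; items 2 and 3 play the role of PT.
responses = [
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]
groups = partition_by_pt_score(responses, pt_items=[2, 3])
```

Each key of `groups` is one subgroup k, so the number of keys corresponds to K for the toy data.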

Third, within each subgroup k, variance estimates $\hat{\sigma}_k^2$ and $\hat{\sigma}_{U,k}^2$ and the standard
error of estimate $S_k$ are computed using item responses of AT1. These estimates are then
summed across K subgroups to obtain

$$T_L = \frac{1}{\sqrt{K}} \sum_{k=1}^{K} \frac{\hat{\sigma}_k^2 - \hat{\sigma}_{U,k}^2}{S_k}.$$

Similarly, $T_B$ is computed using items of subtest AT2. Stout's statistic T is given by

$$T = \frac{T_L - T_B}{\sqrt{2}}.$$

The decision rule is to reject $H_0$ if $T > Z_\alpha$, where $Z_\alpha$ is the upper $100(1-\alpha)$ percentile of the
standard normal distribution, $\alpha$ being the desired level of significance.
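
The final combination and decision rule can be sketched as follows. This is a hedged illustration only. It assumes that the per-subgroup variance estimates and standard errors have already been computed elsewhere, and that the subgroup terms are standardized by 1/sqrt(K) as in Stout (1987). The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def standardized_sum(var_hat, var_unidim_hat, std_err):
    """Sum (var_hat - var_unidim_hat) / std_err over K subgroups, scaled by 1/sqrt(K).

    The three arguments are equal-length sequences holding, for each subgroup,
    the two variance estimates and the standard error (assumed precomputed).
    """
    k = len(var_hat)
    total = sum((v - u) / s for v, u, s in zip(var_hat, var_unidim_hat, std_err))
    return total / sqrt(k)


def stout_t(t_l, t_b):
    """Combine the AT1-based value t_l and the AT2-based value t_b into T."""
    return (t_l - t_b) / sqrt(2)


def reject_h0(t, alpha=0.05):
    """Reject essential unidimensionality when T exceeds the upper normal percentile."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return t > z_alpha
```

With `alpha=0.05` the critical value is about 1.645, so, for example, `reject_h0(stout_t(3.0, 1.0))` evaluates a T of about 1.41 and does not reject.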

When the given test data are well modeled by an essential unidimensional model,
items of AT1, AT2, and PT would all be tapping the same dominant dimension. Therefore,
the variance estimates $\hat{\sigma}_k^2$ and $\hat{\sigma}_{U,k}^2$ will be approximately equal, resulting in a "small"
T-value, suggesting the tenability of $H_0$. On the other hand, when the test data are not well
modeled by an essential unidimensional model, the variance estimate $\hat{\sigma}_k^2$ will be much
larger than $\hat{\sigma}_{U,k}^2$, resulting in a "large" T-value leading to the rejection of $H_0$.

Simulation studies (Stout, 1987; Nandakumar, 1987; Nandakumar & Stout, in press)
on a wide variety of tests have demonstrated the utility of DIMTEST in discriminating 
between one- and two-dimensional tests. Simulation studies by Nandakumar (1991) have 
particularly demonstrated the usefulness of DIMTEST in assessing essential 
unidimensionality with the aid of a rough index of deviation from essential
unidimensionality. The tests in Nandakumar (1991) were modeled by two- and
higher-dimensional IRT models as opposed to a one-dimensional model, and the test items 
were influenced by major and secondary abilities to varying degrees. For some tests, the 
secondary ability or abilities influenced a high proportion of items, and for others the 
secondary ability or abilities influenced only a small proportion of items. It has been shown 
that DIMTEST reliably accepts the hypothesis of essential unidimensionality, provided the 
model generating the test is close to the essential unidimensional model: established when 
each of the secondary abilities influences relatively few items, or if secondary abilities are 
influencing many items, the degree of influence on each item is small. The type-I error in 
these cases was within tolerance of nominal level. As the degree of influence of the 
secondary abilities increases, however, the approximation to an essential unidimensional 
model degenerates, inflating the observed type-I error of the hypothesis of essential
unidimensionality. Simulation results (Stout, 1987; Nandakumar and Stout, in press) have 
particularly demonstrated the excellent power of the statistic T when the model generating
the item responses is two-dimensional (two major abilities) with correlation between
abilities as high as .7 and items jointly influenced by both abilities.

Description of Data 

The data sets used in the present study came from different sources. The U.S. history 
and literature data for grade 11/age 17, from the 1986 National Assessment of Educational 
Progress (NAEP, 1988) test data, were obtained from Educational Testing Service (ETS). 
The General Science data, Arithmetic Reasoning data, and Auto Shop Information data for
grades 10 and 12, from the Armed Services Vocational and Aptitude Battery (ASVAB)
test data, were obtained from Linn, Hastings, Hu, and Ryan (1987). The Mathematics
Usage test data, the science test data, and the reading test data were obtained from the
American College Testing program (ACT).

The NAEP achievement tests are part of the so-called Balanced Incomplete Block
(BIB) design with spiraled administration (Rogers et al., 1988), which allows the study of
interrelationships among all items within a subject area. Because the U.S. history and
literature tests fall into the simplest category of BIB design, it was relatively easy to
gather the response data for all examinees taking these tests. Hence, these tests were
chosen for the present study. The items in each area (history and literature) were divided 
into four "parallel" blocks with approximately the same number of items. One block of 
items out of four was randomly selected in each case for the present study. 

The U.S. history test data (HIST-A) with 36 items consists of items requiring
knowledge from different time periods of U.S. history: Colonization to 1763; the
Revolutionary War and the New Republic, 1763-1815; Civil War, 1815-1877; the rise of
modern America, World War I, 1877-1920; the Depression, World War II, 1920-1945;
Post-World War II, 1945 to the present; and map items requiring the knowledge of
geographical location of different countries in the world. A 31-item subtest of HIST-A,
named HIST, was created (explained in detail in the next section) consisting of all the items
of HIST-A, except the five map items. There are 2428 examinees in the HIST-A and HIST
samples.

The literature test data (LIT) with 30 items consists of items requiring knowledge 
within four literary genres: novels, short stories, and plays; myths, epics, and Biblical 
characters and stories; poetry; and nonfiction. There are 2439 examinees in the LIT sample. 

The ASVAB tests are used by the Department of Defense Student Testing Program
in high schools and postsecondary schools. The Arithmetic Reasoning test data for grades
10 and 12, with 30 items each, consists of items requiring knowledge in solving arithmetic
word problems. The arithmetic reasoning test sample for grade 10 (AR10) has 1984
examinees, and for grade 12 (AR12) has 1961 examinees. The Auto and Shop Information
test data for grades 10 and 12, with 25 items each, consists of items requiring knowledge of
automobile, tools, and shop terminology and practices. The auto shop test sample for grade
10 (AS10) has 1981 examinees, and for grade 12 (AS12) has 1974 examinees. The General
Science test data for grades 10 and 12, with 25 items each, consists of items requiring
knowledge of high school level physical, life, and earth sciences. There are 1990
examinees in the general science test sample for grade 10 (GS10) and 1990 examinees in the
general science grade 12 (GS12) sample.

The ACT mathematics usage test data (MATH) with 40 items consists of items 
requiring knowledge in solving different types of mathematics problems: arithmetic and 
algebra operations, geometry, numeration, story problems, and advanced topics. There are 
2491 examinees in the MATH sample. 

The ACT reading test data (READ-A) with 40 items consists of 4 passages, each
followed by 10 questions. The first three passages are taken from different books all dealing
with humanities, and the last passage is taken from a book about psychology. The first
passage came from Of the Farm by John Updike. The second passage came from Light and
Color in Nature and Art by Samuel Williamson and Herman Cummins. The third passage
came from Theatre: the Dynamics of the Art by Brian Hansen. And the fourth passage
came from Toward a Psychology of Being by Abraham Maslow. A 30-item subset of
READ-A named READ was created (details in the next section) consisting of the first 30
items of READ-A. There are 5000 examinees in the READ-A and READ samples.

The ACT science test data (SCI-A) with 40 items consists of 7 passages, each
followed by 5 to 7 questions. The first passage dealt with the effect of the thymus gland on
the development of the immune system in mice. The second passage dealt with sub-surface
ground water movement and its effects on waste disposal. The third passage dealt with the
periods of the pendulum on the earth and the moon and its relationship to the string length
and mass of the ball. The fourth passage dealt with the environmental impact of effluent.
The fifth passage dealt with a bimetallic catalyst and its relationship to the speed of
certain chemical reactions. The sixth passage dealt with the views of two paleontologists on
the characteristics of dinosaurs. And the seventh passage dealt with the principles of
osmosis and osmotic characteristics of 3 categories of organisms. A 28-item subset of
SCI-A named SCI was created (explained in the next section) consisting of the first 28
items of SCI-A. There are 5000 examinees in the SCI-A and SCI samples.

In addition, in order to examine the effect of sample size on DIMTEST, both SCI and
READ are randomly split into four mutually exclusive data sets. READ is split into
READ1, READ2, READ3, and READ4, with 750, 1000, 1250, and 2000 examinees,
respectively. Similarly, SCI is split into SCI1, SCI2, SCI3, and SCI4, with 750, 1000, 1250,
and 2000 examinees, respectively. In all there are 22 test data sets. These are listed along with
the test size and sample size in the first three columns of Tables 1 and 2.
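
A random split into mutually exclusive subsets of these sizes can be sketched as below. The helper name `split_sample` and the fixed seed are assumptions for illustration. The article does not say how the randomization was actually carried out.

```python
import random


def split_sample(n_examinees, sizes, seed=0):
    """Randomly split examinee indices into disjoint subsets of the given sizes."""
    if sum(sizes) > n_examinees:
        raise ValueError("subset sizes exceed the number of examinees")
    indices = list(range(n_examinees))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    subsets, start = [], 0
    for size in sizes:
        subsets.append(indices[start:start + size])
        start += size
    return subsets


# Sizes taken from the text: 750 + 1000 + 1250 + 2000 = 5000 examinees.
subsets = split_sample(5000, [750, 1000, 1250, 2000])
```

Because the four sizes sum to 5000, every examinee lands in exactly one subset.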
Creation of Two-Dimensional Test Data

Three different sets of two-dimensional test data from the content perspective were 
created by combining responses from test data that were assessed as essentially
unidimensional by DIMTEST in the present study. 

The two-dimensional test data, RS, was created by combining responses of 30 items
of READ with the responses of 6 items of SCI, forming a 36-item test with 5000 examinees.
The 6 items of SCI are part of one of the passages randomly selected from its 5 passages.
Just as in the unidimensional case of READ and SCI, RS is then randomly split into 4
mutually exclusive data sets RS1, RS2, RS3, and RS4, with 750, 1000, 1250, and 2000
examinees, respectively. These tests are listed along with their test sizes and sample sizes
in the first four columns of Table 3.
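
Combining item responses from two tests amounts to appending selected columns. The sketch below assumes that the same examinees appear, in the same row order, in both response matrices. The article does not spell out how examinees were matched, so this is an assumption, and the function name `combine_tests` and the toy data are hypothetical.

```python
def combine_tests(base_responses, extra_responses, extra_items):
    """Append the chosen items of a second test to each examinee's base responses.

    Assumes row i of both inputs belongs to the same examinee.
    """
    if len(base_responses) != len(extra_responses):
        raise ValueError("both tests must cover the same examinees")
    return [
        list(base) + [extra[i] for i in extra_items]
        for base, extra in zip(base_responses, extra_responses)
    ]


# Toy data (hypothetical): 2 examinees, a 2-item base test, a 3-item second test.
base = [[1, 0], [0, 1]]
extra = [[1, 1, 0], [0, 0, 1]]
combined = combine_tests(base, extra, extra_items=[0, 2])
```

Here `combined` holds four responses per examinee: the two base items followed by items 0 and 2 of the second test.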

The two-dimensional test data ARGS1, for grade 10, was created by combining the
responses of 30 items from AR10 with the responses of 5 items (randomly selected from 25
item responses) from GS10. Similarly, ARGS2 was created by combining the responses of
30 items from AR10 with the responses of 10 items from GS10. The two-dimensional test
data GSAR1, for grade 12, was created by combining the responses of 25 items from GS12
with the responses of 5 items from AR12; and GSAR2 was created by combining the
responses of 25 items from GS12 with responses of 10 items from AR12. These test data are
listed along with their test sizes and sample sizes in the first four columns of Table 4.

The two-dimensional test data HSTLIT1 was created by combining the responses of
31 items from HIST with the responses of 5 items (randomly selected from 30 item
responses) from LIT. Similarly, HSTLIT2 and HSTLIT3 were created by combining the
responses of 31 items from HIST with the responses of 8 and 10 items, randomly selected,
from LIT, respectively. These test data are listed along with their test sizes and sample
sizes in the first four columns of Table 5.
Results

Unidimensional Studies

All the tests in Table 1, except HIST, READ, and SCI (which are derived subtests of
HIST-A, READ-A, and SCI-A, respectively, as described below), were initially tested for
essential unidimensionality using DIMTEST. In each case, 500 examinees were randomly
selected from the given pool for the purpose of selecting AT1 items, using factor analysis. The
rest of the examinees were used for computing Stout's statistic T. The size of AT1 (M) was also
determined by DIMTEST. For each test, the T-value and the p-value are noted. Table 1
lists the T- and p-values for all tests in the fourth and fifth columns. The method of
selection of the AT1 subtest, the value of M, and item numbers selected for AT1 are listed
in the last three columns of Table 1.



Table 1 about here 



It can be seen from Table 1 that the p-values associated with test data LIT, AR10,
AR12, GS10, and GS12 are well above the nominal level of significance ($\alpha$ = .05), thereby
strongly affirming the essential unidimensional nature of these tests. That is, the underlying
model generating the test data is judged essentially unidimensional. However, the p-values
associated with HIST-A, AS10, AS12, MATH, READ-A, and SCI-A are well below the
nominal level of significance of .05, thereby strongly affirming the multidimensional nature
of these test data. For these tests where p-values were below the nominal level, the nature
of multidimensionality was further explored.

When the test data are essentially unidimensional, items of AT1 are, by logic, of the
same dominant dimension as the rest of the items; therefore, DIMTEST does not reject the
null hypothesis. When the test data are not unidimensional, however, the items of AT1 are
dimensionally different from the rest of the items, and DIMTEST rejects the null
hypothesis of essential unidimensionality. Following this reasoning for tests where p-values
were very low, the content of items of AT1 was examined. Table 1 shows that for
HIST-A, items 12 through 16 and item 6 were selected for AT1. Upon studying the content
of these items, it was found that items 12 through 16 were homogeneous and differed
dimensionally from the rest of the items of HIST-A; these 5 items require the knowledge of
location of different countries on the world map (map items), while the rest of the items
deal with U.S. history. It is also possible in theory that these items were selected for AT1
due to chance alone. In order to test for this, DIMTEST was applied on the given sample of
2428 examinees 100 times repeatedly, each time randomly splitting 2428 examinees into
two groups of 500 and 1928 examinees. That is, AT1 items were selected repeatedly on
different random samples of 500 examinees each. The resampling results showed that items
12 through 16 were consistently selected for AT1. In addition to these items, one or two
more items, which varied from run to run, were selected from the rest of the items. Hence
it was concluded that the map items are dimensionally different from the rest. A subset
HIST was formed consisting of all items of HIST-A except for map items. It can be seen
from Table 1 that the p-value associated with HIST (p=.095) shows evidence of essential
unidimensionality. Furthermore, from the content perspective, items of AT1 do not form a
set that is dimensionally different from the rest of the items of HIST.
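
The resampling check described above can be sketched as a selection-frequency count. This is an illustration under stated assumptions. The callback `select_at1` stands in for the factor-analytic AT1 selection, which is not reproduced here, and `toy_selector` is a made-up stand-in used only to exercise the loop.

```python
import random
from collections import Counter


def at1_selection_frequency(n_examinees, select_at1, n_runs=100, n_select=500, seed=0):
    """Count how often each item is chosen for AT1 across repeated random splits.

    select_at1: hypothetical callback that takes the examinee indices of the
    selection sample and returns the item numbers chosen for AT1.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        # Draw the selection sample; the remaining examinees would be used for T.
        selection_sample = rng.sample(range(n_examinees), n_select)
        counts.update(set(select_at1(selection_sample)))
    return counts


def toy_selector(sample):
    """Illustrative stand-in: always picks items 12-16 plus one sample-dependent item."""
    return [12, 13, 14, 15, 16, 1 + sample[0] % 5]


freq = at1_selection_frequency(2428, toy_selector)
```

Items whose count equals `n_runs` were selected in every run, which is the pattern the text reports for items 12 through 16.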

A similar phenomenon was observed with test data READ-A and SCI-A. For
READ-A, the last 10 items (items following the last passage) formed part of subtest
AT1. Again these same 10 items formed part of AT1 in repeated resampling applications of
DIMTEST. Upon studying the content of these items, it was found that these 10 items
tapped the "psychology" content area, which is different from the "literature" area tapped by the
first three passages. Another possibility is that, since these are the last 10 items of the reading
test, speededness could have caused the secondary dimension. Based on these observations,
it was concluded that these items were dimensionally different from the rest, and a subset
READ was formed consisting of the first 30 items of READ-A. It can be seen from Table 1
that the p-value associated with READ (p=.32) shows strong evidence of an essential
unidimensional model underlying the test items. In addition, items of AT1 now come from
all the passages of READ.

For test data SCI-A, the 12 items following the last two passages formed part of
AT1. Just as in HIST-A and READ-A, after resampling application of DIMTEST, these
items were removed. The resulting subtest SCI with the first 28 items was still found to be 
multidimensional (p=.002). Thus, a unidimensional subset could not be formed. Unlike 
reading test items, science test items come from distinctly different content areas, with a 
moderate correlation among content areas, and require a higher level of abstract reasoning 
and analytical skills than the reading items. Thus, in addition to content areas, difficulty 
or speededness could have caused major secondary dimensions in this case. 

For the test data MATH, AS10, and AS12, where p-values were low, items of AT1
did not form a subgroup tapping a secondary ability as found in HIST-A, READ-A, or
SCI-A. In addition, upon studying the content of the items, it was found that these items tap
multiple major content areas. Therefore these test data are treated as multidimensional.



Table 2 about here 



Table 2 shows dimensionality results of the unidimensional READ and
multidimensional SCI test data for different sample sizes. The p-values associated with
READ1 through READ4 show evidence of a high degree of essential unidimensionality
underlying the test data. These results are consistent with those of READ in Table 1. The
selection of items of AT1 for tests READ1 through READ4 is highly varied, and yet they
consistently affirm essential unidimensionality. The results of SCI1 through SCI4 are
consistent with those of SCI in Table 1 in affirming multidimensionality of the test data.
Items of AT1 varied highly for all four tests and yet consistently affirmed
multidimensionality, except for SCI3.

Two-dimensional Studies 

Results of two-dimensional reading and science test data are reported in Table 3. 
Since items that tap a distinct second dimension, from the content perspective, are clearly 
known (in this case, 6 SCI items), the science items were forced to be selected for AT1.
This is an example where expert opinion is used to select AT1 items. The T- and p-values
for RS1, RS2, RS3, RS4, and RS strongly confirm the two-dimensional nature of these test
data. As expected, as the sample size increases, the power also increases. 



Table 3, Table 4 and Table 5 about here 



The results of the two-dimensional test data of ARGS and GSAR are reported in
Table 4. Also in this case, since items that are used to create these two-dimensional data
are known (GS items for ARGS and AR items for GSAR), these items were forced to be
selected for AT1. The T- and p-values associated with all four tests strongly confirm
the multidimensionality of these test data. For ARGS1 and ARGS2, there is a sharp
increase in T- and p-values as the degree of contamination, as measured by the number of
item responses contaminated, increases from 5 to 10.

The results of the two-dimensional history and literature test data are reported in 
Table 5. As with other two-dimensional tests, LIT items were forced to be selected for
AT1. Also in this case, the T- and p-values confirm the multidimensional nature of these
data. 

DIMTEST was again applied to a sample of test data selected from two-dimensional
tests. This time FA was used as the method of selection for AT1 items. The purpose of this
analysis was to check if the FA method of selection of AT1 items would lead to
p-values similar to those obtained with EO. The findings revealed that for these tests FA could not always ferret
out purely unidimensional items from the content perspective. The subtest AT1 had a mixture
of items tapping both dimensions, and DIMTEST was then able to correctly assess
dimensionality only when there were 1000 or more examinees for computing the statistic.

Discussion and Conclusions

None of the tests examined in the present study are strictly unidimensional in the 
sense of measuring only one ability. Items, in every test, are influenced by several 
secondary abilities in addition to the major ability intended to be measured. Based on 
DIMTEST analysis, some test data were assessed as fitting an essential unidimensional 
model while others were not. This depends upon whether the secondary abilities were major 
or minor. 

The unidimensionality analyses of HIST-A, READ-A, and SCI-A present interesting
findings. For HIST-A, the map items had high second factor loadings and thus were
selected for AT1. Consequently, the computed T-statistic was large, leading to the
rejection of $H_0$ and implying that AT1 items are dimensionally different from the rest of
the test. Content analysis of HIST-A reveals that HIST-A consists of items of United
States history for different time periods spanning from 1763 to present time. These items 
cover such a large span of time that the test is surely slightly multidimensional for this 
reason alone. In addition, the test contains map items. The map items, however, were 
isolated and statistically confirmed as not measuring the same trait as the rest of the test. 
This shows that the statistic T is highly sensitive to distinct major dimensions (in this 
case, map items). The analysis of HIST, with map items removed, reveals that it is 
essentially unidimensional. Thus the statistic T seems to be robust against relatively minor 
correlated abilities influencing test items while being sensitive to major abilities. Likewise, 
for the test data READ-A, multidimensionality was caused by items tapping a psychology
topic (scientific) versus literature topics (humanities). Once the psychology item responses
were removed, the remaining item responses could be well modeled by an essential 
unidimensional model. In contrast, the multidimensionality in SCI-A was due to not only 
distinct major abilities but also likely due to speededness of the test, which in itself is a 
major determinant. Moreover, an essential unidimensional subtest could not be formed for 
SCI-A. 

Another interesting feature of these analyses is that although both READ and SCI 
are paragraph comprehension type test data, they differ widely in the degree of their 
approximation to essential dimensionality. The READ test data has 3 passages each
followed by 10 items, all dealing with humanities. Although these passages come from
different sources, the model underlying the item responses approximates an essential 
unidimensional model. This is an example where a few secondary abilities (possibly highly 
correlated) each influence a large group of items. In contrast, the SCI test data has 5 
passages each followed by 5 or 6 items. These passages, although they deal with science in 
general, come from widely different and conceptually difficult topics, and the model 
underlying the item responses does not approximate an essential unidimensional model. 
This is an example where many secondary abilities each influence a small group of items, 
but the strength of the influence of these secondary abilities is such that item responses 
cannot be well modeled by an essential unidimensional model. These results are consistent 
with simulation results of Nandakumar (1991) in that the number of items influenced by 
secondary abilities and the strength of the secondary abilities present determine the degree 
to which the assumption of essential unidimensionality is violated. 
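
The role of a secondary ability can be illustrated with a small simulation. The sketch below is not DIMTEST and does not reproduce the simulations of Nandakumar (1991); it assumes a hypothetical logistic model in which every item depends on a dominant ability and a few items also depend on one secondary ability with weight `strength`. All function names, parameter values, and item counts are assumptions chosen for illustration. It compares the covariance of two items, conditional on the score on a separate set of items, for a pair influenced by the secondary ability and a pair that is not.

```python
import math
import random


def simulate_responses(n_examinees, n_primary, n_secondary, strength, seed=0):
    """Simulate 0/1 responses under a hypothetical two-ability logistic model.

    Every item depends on a dominant ability theta; the last `n_secondary`
    items also depend on a secondary ability eta with weight `strength`.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)  # dominant ability
        eta = rng.gauss(0.0, 1.0)    # secondary ability (assumed independent)
        row = []
        for item in range(n_primary + n_secondary):
            logit = theta + (strength * eta if item >= n_primary else 0.0)
            p_correct = 1.0 / (1.0 + math.exp(-logit))
            row.append(1 if rng.random() < p_correct else 0)
        data.append(row)
    return data


def conditional_covariance(data, item_a, item_b, conditioning_items):
    """Average covariance of two items within groups of equal conditioning score."""
    groups = {}
    for row in data:
        score = sum(row[i] for i in conditioning_items)
        groups.setdefault(score, []).append((row[item_a], row[item_b]))
    total, weighted = 0, 0.0
    for pairs in groups.values():
        n = len(pairs)
        if n < 2:
            continue  # a covariance needs at least two examinees
        mean_a = sum(a for a, _ in pairs) / n
        mean_b = sum(b for _, b in pairs) / n
        cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs) / n
        weighted += n * cov
        total += n
    return weighted / total


# 20 items driven by the dominant ability only, 4 items with a secondary ability.
data = simulate_responses(4000, 20, 4, strength=1.5, seed=1)
conditioning = range(18)  # score on items 0..17 plays the role of the conditioning subtest
cov_primary = conditional_covariance(data, 18, 19, conditioning)
cov_secondary = conditional_covariance(data, 20, 21, conditioning)
print(cov_primary, cov_secondary)
```

Under these assumptions the pair influenced by the secondary ability shows a clearly larger conditional covariance than the pair that is not. Lowering `strength` or the number of affected items weakens the effect, which is the qualitative pattern described above.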

The results obtained in this study are similar to the results obtained by other 
researchers who have analyzed some of these data using different statistical methodologies. 
Zwick (1987) analyzed the dimensionality of HIST-A and LIT using various techniques 
and concluded that these are unidimensional. Regarding the ACT 
data, it is believed that MATH and SCI are multidimensional. Bock, Gibbons, and Muraki 
(1985) have analyzed ASVAB test data for a different sample and found a significant 
second factor for arithmetic reasoning, general science, and auto shop information. Since 
the sample used here is not the same, it is hard to make a meaningful comparison. 

The results for the two-dimensional tests demonstrate that the statistic T has very good 
power. The statistic T is able to distinguish minor secondary traits, which should be 
largely discounted, from the major dominant traits. This is evidenced in several cases. The 
test data HIST illustrates this. There is inherent multidimensionality in HIST as it covers 
a range of time periods in history. However, the p-value is above the nominal level of 
significance, suggesting acceptance of unidimensionality. By contrast, with the additional 
contamination of only 5 LIT items or 5 map items, the T-value shoots up, indicating 
essential multidimensionality of the data. This remarkable sensitivity of the statistic T to 
major dimensions illustrates its power. 
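
The decision rule described above can be sketched as follows. This is only an illustration: it assumes that T is referred to a standard normal distribution under the null hypothesis and that large positive values of T lead to rejection. The function names and the example T values are hypothetical and are not results from this study.

```python
import math


def upper_tail_p_value(t_statistic):
    """One-sided p-value, assuming T is standard normal under the null hypothesis."""
    return 0.5 * math.erfc(t_statistic / math.sqrt(2.0))


def reject_unidimensionality(t_statistic, alpha=0.05):
    """Reject essential unidimensionality when the p-value falls below alpha."""
    return upper_tail_p_value(t_statistic) < alpha


for t_value in (0.8, 3.2):  # hypothetical T values, not taken from the tables
    print(t_value, round(upper_tail_p_value(t_value), 4),
          reject_unidimensionality(t_value))
```

A small T gives a p-value above the nominal level, so unidimensionality is retained; a large T gives a p-value below it, so the data are judged essentially multidimensional.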

These results, for the first time, have illustrated both the factor analysis (FA) approach 
and the expert opinion (EO) approach to selecting items for the subtest ATI. Tables 1 and 2 use FA 
to select ATI items, and Tables 3, 4, and 5 use EO. It is evident that FA serves as an 
exploratory tool and EO serves as a confirmatory tool in selecting items for ATI to assess 
essential dimensionality. 

The dimensionality of a given set of item responses is, in a certain sense, a 
continuum: one cannot determine whether a given set of responses generated by a set of 
items for an examinee sample is truly essentially unidimensional or truly multidimensional; 
one can only approximate. Although the exact number of dimensions in an IRT model is 
rigorously defined for a finite length test, the number of dominant dimensions (whether 
determined by Stout's essential dimensionality conceptualization or by some other 
conceptualization) is only rigorously definable for an infinitely long test. In other words, 
for a finite test (that is, for any real test data) it is a judgment call whether a particular 
IRT model is seen as having one, or more than one, dominant dimension, based upon where 
on the continuum the amount of multidimensionality falls. One consequence of this is that 
the performance of ability estimation procedures such as LOGIST or BILOG needs to be 
addressed in the context of the assessment of the amount of lack of unidimensionality. In 
this regard, indices of lack of essential unidimensionality developed by Junker and Stout 
(1991) will be extremely useful. These indices can be used to decide when it is safe to use 
unidimensional estimation procedures such as LOGIST and BILOG to arrive at accurate 
estimates of ability. 

In cases where the approximation of an essential unidimensional model to the data is in 
question, there are various alternatives. The test items can be split into essential 
unidimensional subtests (for example, HIST-A and READ-A). Another possible approach 
is to investigate the applicability of the concept of "testlet" to the data (Rosenbaum, 1988; 
Thissen, Steinberg, and Mooney, 1989). If the assumption of local independence is violated 
within the passages but maintained among the passages, the theory of testlets promises 
unidimensional scoring for such tests. The test data SCI-A and SCI could fall into this 
category. Multidimensional modeling can be applied if either of the above procedures 
cannot be applied (Reckase, 1989). 

ATI items can be selected by using factor analysis (FA) or by expert 
opinion (EO). 

M is the size of ATI. 

Table 3 

Results of H0: dE = 1 for two-dimensional tests: 
READ & SCI; α = .05 

Table 4 

Results of H0: dE = 1 for two-dimensional tests: 

Table 5 

Results of H0: dE = 1 for two-dimensional tests: 
HIST & LIT; α = .05 